Shard Tor hidden services across multiple daemons #920
When running all Tor hidden services (~30 total) from a single daemon, if we restart Tor, it can take a few hours for all the hidden services to come online.
This is especially bad for Umbrel users doing an update over Tor because the device will go down halfway through the update when Tor is restarted. It should come back online within an hour or two, but this isn't obvious, and to the user it appears as though the update has bricked their device.
Running multiple Tor daemons in parallel, each managing a smaller number of hidden services, seems to resolve the issue and lets the hidden services come back online promptly.
For now we're running three instances of Tor: `tor` for internal Umbrel services, which should come online almost instantly, and `app_tor` and `app_2_tor`, which have about 10 hidden services each for all the installed apps and should come online within a few minutes.

I'm not sure if this is a bug in Tor, or if there's a better way we should be doing this, but it seems like a single daemon should be able to manage this many hidden services. We should report this to the Tor developers and see if we can resolve this more cleanly.
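For reference, a minimal sketch of what the sharded configuration looks like conceptually, as three separate torrc files. The paths, ports, and upstream addresses below are illustrative placeholders, not the actual layout used in this PR:

```
# "tor" -- core daemon, internal Umbrel services only
DataDirectory /var/lib/tor
SocksPort 9050
HiddenServiceDir /var/lib/tor/web
HiddenServicePort 80 10.21.21.3:80

# "app_tor" -- first shard, roughly 10 app hidden services
DataDirectory /var/lib/tor-app
SocksPort 9052
HiddenServiceDir /var/lib/tor-app/app-1
HiddenServicePort 80 10.21.21.18:3000
# ... one HiddenServiceDir/HiddenServicePort pair per app in this shard

# "app_2_tor" -- second shard, the remaining app hidden services
DataDirectory /var/lib/tor-app-2
SocksPort 9053
HiddenServiceDir /var/lib/tor-app-2/app-11
HiddenServicePort 80 10.21.21.28:3000
# ... remaining apps
```

Each daemon needs its own `DataDirectory` and listening ports so the instances don't clash. Since an onion address is derived from the key material stored in its `HiddenServiceDir`, moving a service between shards doesn't change its address.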